What is AutoGen?
Microsoft Research's framework for building multi-agent AI systems that can reason, collaborate, and execute code.
AutoGen is an open-source framework from Microsoft Research that lets you build systems where multiple AI agents work together to solve complex tasks. Think of it as a runtime for AI teamwork — agents can talk to each other, use tools, write and execute code, and ask humans for input.
Unlike workflow tools (n8n, Zapier) that execute deterministic steps, AutoGen agents reason autonomously. They decide how to solve a problem, not just follow a pre-written path.
Why does AutoGen exist?
The Big Picture
AssistantAgent, UserProxyAgent, and GroupChat. This course uses v0.4 patterns.AutoGen vs. The World
| Framework | Paradigm | Best For |
|---|---|---|
| AutoGen | Autonomous multi-agent conversation | Complex reasoning, code generation, research tasks |
| LangGraph | Stateful graph-based workflows | Fine-grained control over agent state & branching |
| CrewAI | Role-based agent teams | Business automation with defined roles |
| n8n | Deterministic workflow automation | Integrating SaaS tools with predictable logic |
Core Concepts
The building blocks: agents, conversations, termination, and the LLM config pattern.
The Two Primary Agents
LLM Configuration
import autogen llm_config = { "config_list": [{ "model": "gpt-4o", "api_key": "sk-..." }], "temperature": 0.1, "cache_seed": 42, } config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
Conversations & Termination
# 1. Max turns user_proxy.initiate_chat(assistant, max_turns=5) # 2. Keyword assistant = autogen.AssistantAgent(system_message="...reply TERMINATE when done") # 3. Custom function def my_term(msg): return "task_complete" in (msg["content"] or "").lower() user_proxy = autogen.UserProxyAgent(is_termination_msg=my_term)
Human Input Modes
| Mode | Behavior | Use Case |
|---|---|---|
| ALWAYS | Asks human at every step | Interactive sessions, demos |
| TERMINATE | Asks human only on termination | Approval gate at the end |
| NEVER | Fully autonomous | Production pipelines |
Your First Agents
Build a working two-agent system from 15 lines of Python.
Installation
pip install pyautogen pip install pyautogen[docker] # for Docker sandbox export OPENAI_API_KEY="sk-..."
Hello World: Two-Agent System
import autogen llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "sk-..."}]} assistant = autogen.AssistantAgent( name="assistant", llm_config=llm_config, system_message="You are a Python expert. When done, reply: TERMINATE" ) user_proxy = autogen.UserProxyAgent( name="user_proxy", human_input_mode="NEVER", is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""), code_execution_config={"work_dir": "coding", "use_docker": False} ) user_proxy.initiate_chat(assistant, message="Print the first 10 Fibonacci numbers.")
Conversation Patterns
Two-agent, sequential chaining, nested chats, Swarm — and when to use each.
Pattern 1: Two-Agent (Default)
One user_proxy, one assistant. Best for focused single tasks: code generation, Q&A, analysis.
Pattern 2: Sequential Chaining
r1 = user_proxy.initiate_chat(writer, message="Write a blog post about RAG") r2 = user_proxy.initiate_chat(critic, message=f"Review:\n{r1.summary}") r3 = user_proxy.initiate_chat(editor, message=f"Apply feedback:\n{r2.summary}")
Pattern 3: Nested Chats
assistant.register_nested_chats( trigger=user_proxy, chat_queue=[{ "recipient": specialist, "summary_method": "last_msg", "max_turns": 3 }] )
Pattern 4: Swarm (v0.4)
from autogen import SwarmAgent, initiate_swarm_chat triage = SwarmAgent(name="triage", handoffs=["billing", "tech", "sales"]) initiate_swarm_chat(initial_agent=triage, agents=[triage, billing, tech, sales], messages="Can't access account after payment failed")
Tool Use & Function Calling
Give agents real-world capabilities: web search, database queries, API calls.
Defining Tools with Decorators
@user_proxy.register_for_execution() @assistant.register_for_llm(description="Get current weather for a city") def get_weather(city: str) -> str: return f"Weather in {city}: 22°C, sunny" @user_proxy.register_for_execution() @assistant.register_for_llm(description="Search the web for current info") def web_search(query: str) -> str: return f"Results for '{query}': ..."
How Tool Calling Works
register_for_llm and user_proxy needs register_for_execution?Group Chat
Orchestrate 3+ specialized agents collaborating on a shared task.
planner = autogen.AssistantAgent(name="Planner", llm_config=llm_config, system_message="Break tasks into subtasks and assign them.") coder = autogen.AssistantAgent(name="Coder", llm_config=llm_config, system_message="Write high-quality Python. No prose, just code.") critic = autogen.AssistantAgent(name="Critic", llm_config=llm_config, system_message="Review code for bugs, edge cases, style.") groupchat = autogen.GroupChat( agents=[user_proxy, planner, coder, critic], messages=[], max_round=12, speaker_selection_method="auto" ) manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config) user_proxy.initiate_chat(manager, message="Build a FastAPI sentiment endpoint")
Speaker Selection
| Method | How | Best For |
|---|---|---|
auto | LLM picks most relevant agent | General purpose |
round_robin | Agents take turns in order | Structured loops |
random | Random each turn | Diverse perspectives |
| custom fn | Your function decides | Complex routing |
max_round limits and concise system messages.Memory & RAG
Give agents long-term memory with vector stores and retrieval-augmented generation.
Built-in RAG: RetrieveUserProxyAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent rag = RetrieveUserProxyAgent(name="rag", retrieve_config={ "task": "qa", "docs_path": ["./docs/"], "model": "gpt-4o", "vector_db": "chroma", "collection_name": "my_docs", "get_or_create": True, }, code_execution_config=False, human_input_mode="NEVER") rag.initiate_chat(assistant, problem="What does our API return on auth failure?")
Memory Architecture Patterns
AutoGen vs. Other Frameworks
When to use AutoGen, when to use something else, and how to combine them.
| Scenario | Best Pick | Why |
|---|---|---|
| AI writes & debugs code autonomously | AutoGen | Code execution loop + multi-agent review |
| Research: gather, analyze, synthesize | AutoGen | Autonomous reasoning + tool use |
| Strict step-by-step workflow | LangGraph / n8n | Deterministic control flow |
| Role-based teams (PM, dev, QA) | CrewAI | First-class role/goal/task primitives |
| SaaS integration automation | n8n | 500+ no-code connectors |
| Complex RAG + reasoning | LlamaIndex + AutoGen | Best of both |
• Flexible conversation patterns
• Human-in-the-loop at any granularity
• Active research + rapid updates
• Token costs escalate in group chats
• Multi-agent debugging is hard
• v0.4 API still maturing
Architecture
Why multi-agent for a pizza bot? What each agent does. How this differs from Dialogflow CX or Amazon Lex.
You've already built this bot in Dialogflow CX (state machine + NLU) and Amazon Lex (slots + Lambda). The AutoGen version works completely differently — instead of a pre-wired state machine, you have agents that reason their way through the conversation.
No pages, no routes, no explicit state transitions. The agents decide what to ask, when to validate, and when to submit.
Dialogflow CX vs AutoGen — Same Bot, Different Soul
| Dimension | Dialogflow CX | AutoGen |
|---|---|---|
| Flow control | State machine (Pages → Routes) | Agents reason and decide |
| Input handling | Slot filling + entity types | LLM extracts intent + entities |
| Validation | Regex on slot values | Validator agent checks order object |
| Flexibility | Rigid — changes need flow redesign | High — change system prompt |
| Debugging | Visual flow trace in console | Agent message log, print statements |
System Architecture
Conversation Flow
The Agents
Three agents with distinct roles. The soul of the system lives in the prompts.
OrderAgent System Prompt (key excerpt)
Your job flow:
1. Greet the customer warmly
2. Call get_menu() once (silently) to know what's available
3. Collect: pizza type, size, crust (default: thin), extras, removals
4. Call calc_price() once you have all details
5. Call validate_order() — read VALID or INVALID response
6. If VALID → present summary + ask for confirmation
7. If INVALID → fix issues, recalculate, re-validate
8. On confirmation → call submit_order()
9. Reply with order ID + ETA, then: TERMINATEProject Setup
Install dependencies, create the project structure, configure your API key.
File Structure
├── main.py # entry point — two modes: automated + interactive
├── agents.py # agent definitions, prompts, tool registration
├── tools.py # get_menu, calc_price, submit_order
├── menu.py # pizza data — prices, toppings, crusts
├── config.py # LLM_CONFIG from .env
├── .env # OPENAI_API_KEY=sk-...
└── requirements.txt
Installation
python -m venv .venv source .venv/bin/activate pip install pyautogen python-dotenv
import os from dotenv import load_dotenv load_dotenv() LLM_CONFIG = { "config_list": [{"model": "gpt-4o", "api_key": os.getenv("OPENAI_API_KEY")}], "temperature": 0.2, "cache_seed": None, }
Menu & Tools
The pizza data and the four tool functions agents use to interact with it.
menu.py
PIZZAS = { "margherita": {"base_price": {"small":10,"medium":14,"large":18,"xl":22}, "default_toppings": ["mozzarella","tomato_sauce","basil"]}, "pepperoni": {"base_price": {"small":12,"medium":16,"large":20,"xl":24}, "default_toppings": ["mozzarella","tomato_sauce","pepperoni"]}, "bbq_chicken": { ... }, "veggie": { ... }, } EXTRA_TOPPINGS = {"extra_cheese":2.0, "mushroom":1.5, "bacon":2.5, ...} CRUSTS = ["thin", "thick", "stuffed", "gluten_free"] CRUST_UPCHARGE = {"stuffed": 3.0, "gluten_free": 2.0}
Key Tool Functions
def get_menu() -> str: """Return full menu as JSON string.""" return json.dumps({"pizzas": ..., "extra_toppings": ..., "crusts": ...}) def calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str: """Calculate total price. Returns JSON with total + breakdown.""" total = PIZZAS[pizza_type]["base_price"][size] + CRUST_UPCHARGE.get(crust, 0) for t in (extra_toppings or []): total += EXTRA_TOPPINGS[t] return json.dumps({"total": round(total, 2), "breakdown": {...}}) def submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str: """Submit order to kitchen. Returns order_id + ETA.""" order_id = f"ORD-{random.randint(10000,99999)}" # In production: write to DB, call kitchen API, send SMS... return json.dumps({"order_id": order_id, "eta": eta, "message": "Confirmed!"})
Order Agent
Creating the OrderAgent and registering tools with the two-decorator pattern.
def register_tools(order_agent, validator_agent, user_proxy): @user_proxy.register_for_execution() @order_agent.register_for_llm(description="Fetch the full pizza menu") def _get_menu() -> str: return get_menu() @user_proxy.register_for_execution() @order_agent.register_for_llm(description="Calculate total price") def _calc_price(pizza_type: str, size: str, crust: str="thin", extra_toppings: list=None) -> str: return calc_price(pizza_type, size, crust, extra_toppings) @user_proxy.register_for_execution() @order_agent.register_for_llm(description="Validate order before confirming with customer") def _validate_order(pizza_type: str, size: str, crust: str, extra_toppings: list, remove_toppings: list, price: float) -> str: # Spins up a one-shot ValidatorAgent chat, returns "VALID" or "INVALID: ..." order_str = json.dumps({"pizza_type": pizza_type, "size": size, ...}) vproxy = autogen.UserProxyAgent(name="vp", human_input_mode="NEVER", is_termination_msg=lambda x: True, code_execution_config=False) vproxy.initiate_chat(validator_agent, message=f"Validate:\n{order_str}", max_turns=1, silent=True) history = validator_agent.chat_messages.get(vproxy, []) for m in reversed(history): if m["role"] == "assistant": return m["content"] @user_proxy.register_for_execution() @order_agent.register_for_llm(description="Submit confirmed order. Call ONLY after customer says yes.") def _submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name="Guest") -> str: return submit_order(pizza_type, size, crust, extra_toppings, remove_toppings, total_price, customer_name)
@ decorators must be applied at definition time in the function scope so AutoGen can capture correct references and bind them to the right agents.Validator Agent
Called as a tool by the OrderAgent — a one-shot LLM check on the assembled order.
You are a strict pizza order validator for PizzaLab.
Check ALL of the following:
1. pizza_type is one of: margherita, pepperoni, bbq_chicken, veggie
2. size is one of: small, medium, large, xl
3. crust is one of: thin, thick, stuffed, gluten_free
4. All extra_toppings are on the extras menu
5. Price is plausible for size + extras
Respond with EXACTLY one of:
- "VALID"
- "INVALID: [issue 1]; [issue 2]"
No extra text. Be terse.register_nested_chats — it fired on every turn and couldn't reliably find the order JSON. The fix: validate_order() is a regular tool that spins up a one-shot initiate_chat internally. Explicit, deterministic, visible in the tool log.Wiring It Together
main.py assembles all agents and handles both automated testing and real interactive sessions cleanly.
# KEY LESSON: human_input_mode="ALWAYS" prompts on EVERY turn # including tool-call turns. Fix: use NEVER + a smart reply function. def smart_human_reply(recipient, messages, sender, config): last = messages[-1] if messages else {} role = last.get("role", "") if role == "tool": # tool result turn — pass through return False, None if last.get("tool_calls"): # tool call turn — pass through return False, None # ← CRITICAL: prevents 400 BadRequestError # Genuine conversational turn — ask the human human_input = input("You: ").strip() if human_input.lower() == "exit": return True, "exit" return True, human_input user_proxy = autogen.UserProxyAgent(name="customer", human_input_mode="NEVER", ...) user_proxy.register_reply(trigger=[autogen.AssistantAgent, None], reply_func=smart_human_reply, position=0)
Three Bugs We Fixed Building This
| Bug | Root Cause | Fix |
|---|---|---|
| Infinite validation loop | register_nested_chats fires on every turn | validate_order() as a regular tool |
| TypeError: NoneType not iterable | content=None on tool-call messages | (x.get("content") or "") |
| 400 BadRequestError | Empty reply inserted between tool_call and tool_result | Guard on last.get("tool_calls") |
Running & Testing
Two ways to run the bot. What to expect. What to test.
Run Commands
# Automated test (scripted replies, no typing needed) python main.py # Interactive mode (you type each customer reply) INTERACTIVE=1 python main.py
Interactive Session Preview
Edge Cases to Test
| Input | Expected |
|---|---|
| "I want a Hawaiian pizza" | Agent says not on menu, suggests alternatives |
| "XL margherita stuffed crust + pineapple + bacon" | Prices stuffed upcharge ($3) + 2 extras ($4) correctly |
| "Actually change it to medium" | Recalculates from scratch |
| "No" to confirmation | Agent asks what to change, doesn't submit |
Extensions
Where to take this next — production-grade enhancements.
get_past_orders(customer_id) tool backed by ChromaDB. Agent can offer "Same as last time?"main() in an Azure Function (HTTP trigger). Deploy to Container Apps. Swap OpenAI for Azure OpenAI endpoint.